School of Computer Science and Engineering, Nanyang Technological University
Abstract: Task-oriented semantic communication has emerged as a fundamental approach for enhancing performance in various communication scenarios. While recent advances in Generative Artificial Intelligence (GenAI), such as Large Language Models (LLMs), have been applied to semantic communication design, the potential of Large Multimodal Models (LMMs) remains largely unexplored. In this paper, we investigate an LMM-based vehicle AI assistant built on the Large Language and Vision Assistant (LLaVA) and propose a task-oriented semantic communication framework to facilitate efficient interaction between users and cloud servers. To reduce computational demands and shorten response time, we optimize LLaVA's image slicing so that it focuses selectively on the areas of greatest interest to users. Additionally, we assess the importance of image patches by combining objective and subjective user attention, and we adjust the energy used to transmit semantic information accordingly. This strategy improves resource utilization and ensures that critical information is transmitted precisely. We construct a Visual Question Answering (VQA) dataset for traffic scenarios to evaluate effectiveness. Experimental results show that our semantic communication framework significantly increases question-answering accuracy under the same channel conditions, performing particularly well at low Signal-to-Noise Ratios (SNRs). Accuracy improves by 13.4% at an SNR of 12 dB and by 33.1% at 10 dB.
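The patch-weighting and energy-adjustment step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the objective and subjective attention scores per patch are taken as given nonnegative arrays, `alpha` is a hypothetical mixing weight, and energy is split in simple proportion to the blended importance under a fixed total budget.

```python
import numpy as np


def patch_importance(objective, subjective, alpha=0.5):
    """Blend objective and subjective attention into one importance score per patch.

    Both inputs are hypothetical nonnegative per-patch scores. Each is normalized
    to sum to 1 so that neither dominates merely because of its scale.
    `alpha` is a hypothetical mixing weight in [0, 1].
    """
    obj = np.asarray(objective, dtype=float)
    sub = np.asarray(subjective, dtype=float)
    obj = obj / obj.sum()
    sub = sub / sub.sum()
    return alpha * obj + (1.0 - alpha) * sub


def allocate_energy(importance, total_energy):
    """Split a fixed transmit-energy budget across patches in proportion to importance."""
    weights = np.asarray(importance, dtype=float)
    return total_energy * weights / weights.sum()


# Three hypothetical patches; the user's question concentrates attention on the last one.
importance = patch_importance([1, 1, 2], [0, 1, 3])
energy = allocate_energy(importance, total_energy=8.0)
```

Proportional allocation is only the simplest rule; an SNR-aware or water-filling allocation would be an equally plausible choice, and the abstract does not say which rule the framework uses.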
Abstract: Synthetic video generation with foundation models has gained attention for its realism and wide range of applications. While these models produce high-quality frames, they often fail to respect common sense and physical laws, resulting in abnormal content. Existing metrics such as VideoScore emphasize general quality but ignore such violations and lack interpretability. A more insightful approach is to use multimodal large language models (MLLMs) as interpretable evaluators, as exemplified by FactScore. Yet MLLMs' ability to detect abnormalities in synthetic videos remains underexplored. To address this, we introduce VideoHallu, a benchmark featuring synthetic videos from models such as Veo2, Sora, and Kling, paired with expert-designed QA tasks across various categories that are solvable with human-level reasoning. We assess several SoTA MLLMs, including GPT-4o, Gemini-2.5-Pro, and Qwen-2.5-VL, as well as newer models such as Video-R1 and VideoChat-R1. Despite strong real-world performance on MVBench and MovieChat, these models still hallucinate on basic commonsense and physics tasks in synthetic settings, underscoring the challenge of hallucination. We further fine-tune SoTA MLLMs with Group Relative Policy Optimization (GRPO) on real and synthetic commonsense/physics data. Results show notable accuracy gains, especially when counterexamples are integrated, advancing MLLMs' reasoning capabilities. Our data is available at https://github.com/zli12321/VideoHallu.
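GRPO replaces a learned value baseline with statistics computed over a group of responses sampled for the same prompt. The sketch below assumes scalar sequence-level rewards (for example, 1 for a correct answer and 0 otherwise) and shows the group-standardized advantage together with a PPO-style clipped surrogate. It omits the KL penalty toward a reference policy and token-level averaging, and the function names are hypothetical rather than taken from the paper.

```python
import numpy as np


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of responses sampled for the same prompt.

    A_i = (r_i - mean(r)) / (std(r) + eps); no learned critic is needed.
    """
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


def clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO-style clipped objective on sequence-level log-probabilities (to be maximized).

    Simplification: the KL penalty toward a reference policy is omitted.
    """
    ratio = np.exp(np.asarray(logp_new, dtype=float) - np.asarray(logp_old, dtype=float))
    adv = np.asarray(advantages, dtype=float)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    # Taking the elementwise minimum keeps the update pessimistic.
    return np.minimum(unclipped, clipped).mean()


# Four hypothetical sampled answers to one video question: 1 = correct, 0 = hallucinated.
adv = group_relative_advantages([1, 0, 0, 1])
objective = clipped_surrogate([-1.0, -2.0, -2.5, -0.8], [-1.2, -1.9, -2.4, -1.0], adv)
```

Because advantages are relative within the group, a question on which every sampled answer is equally right or wrong contributes (almost) no gradient, which is one reason curated counterexamples can matter for this kind of training.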
Abstract: With the rapid development of artificial intelligence, intelligent decision-making techniques have gradually surpassed human-level performance in various human-machine competitions, especially in complex multi-agent cooperative task scenarios. Multi-agent cooperative decision-making involves multiple agents working together to complete established tasks and achieve specific objectives. These techniques are widely applicable in real-world scenarios such as autonomous driving, drone navigation, disaster rescue, and simulated military confrontations. This paper begins with a comprehensive survey of the leading simulation environments and platforms used for multi-agent cooperative decision-making. Specifically, we provide an in-depth analysis of these simulation environments from various perspectives, including task formats, reward allocation, and the underlying technologies employed. Subsequently, we provide a comprehensive overview of the mainstream intelligent decision-making approaches, algorithms, and models for multi-agent systems (MAS). These approaches can be broadly categorized into five types: rule-based (primarily fuzzy logic), game theory-based, evolutionary algorithm-based, deep multi-agent reinforcement learning (MARL)-based, and large language model (LLM) reasoning-based. Given the significant advantages of MARL- and LLM-based decision-making methods over traditional rule-based, game-theoretic, and evolutionary approaches, this paper focuses on multi-agent methods built on MARL and LLMs. We provide an in-depth discussion of these approaches, highlighting their methodological taxonomies, advantages, and drawbacks. Finally, we detail several prominent future research directions and potential challenges in multi-agent cooperative decision-making.
Abstract: The emergence of distributed Mixture-of-Experts (DMoE) systems, which deploy expert models at edge nodes, offers a pathway to connected intelligence in sixth-generation (6G) mobile networks and edge artificial intelligence (AI). However, current DMoE systems lack an effective expert selection algorithm that simultaneously accounts for the task-expert relevance and the channel diversity inherent in these systems. Traditional AI or communication systems focus on either performance or channel conditions, and applying these methods directly leads to high communication overhead or low performance. To address this, we propose the DMoE protocol to schedule expert inference and inter-expert transmission. This protocol identifies expert selection and subcarrier allocation as the key optimization problems. We formulate an expert selection problem that incorporates both AI performance and channel conditions, and further extend it to a Joint Expert and Subcarrier Allocation (JESA) problem for comprehensive AI and channel management within the DMoE framework. For the NP-hard expert selection problem, we introduce the Dynamic Expert Selection (DES) algorithm, which uses a linear relaxation as a bounding criterion to significantly reduce search complexity. For the JESA problem, we identify a unique structural property that ensures asymptotic optimality in most scenarios. We propose an iterative algorithm that solves subcarrier allocation as a subproblem and integrates it with the DES algorithm. The proposed framework manages the tradeoff between task relevance and channel conditions through a tunable importance factor, enabling flexible adaptation to diverse scenarios. Numerical experiments validate the dual benefits of the proposed expert selection algorithm: high performance and significantly reduced cost.
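To illustrate how a linear relaxation can serve as a bounding criterion, the sketch below solves a simplified stand-in problem: choose a subset of experts that maximizes total task relevance subject to a communication-cost budget, which is a 0/1 knapsack. The fractional (LP-relaxed) knapsack value bounds each branch, and branches whose bound cannot beat the incumbent are pruned. This formulation, the requirement that every cost be positive, and all names are assumptions for illustration; it is not the paper's DES algorithm or its JESA formulation.

```python
def lp_bound(items, budget, start, value, weight):
    """Upper bound from the linear (fractional) relaxation of the remaining items.

    `items` must be sorted by relevance/cost in descending order; taking a
    fraction of the first item that does not fit gives the LP optimum.
    """
    bound, remaining = value, budget - weight
    for rel, cost, _ in items[start:]:
        if cost <= remaining:
            remaining -= cost
            bound += rel
        else:
            bound += rel * remaining / cost
            break
    return bound


def select_experts(relevance, cost, budget):
    """Branch and bound for max sum(relevance) s.t. sum(cost) <= budget (0/1 choices)."""
    items = sorted(
        ((r, c, k) for k, (r, c) in enumerate(zip(relevance, cost))),
        key=lambda t: t[0] / t[1],  # hypothetical positive costs only
        reverse=True,
    )
    best_value, best_set = 0.0, []

    def branch(i, value, weight, chosen):
        nonlocal best_value, best_set
        if weight > budget:
            return  # infeasible: over the communication budget
        if value > best_value:
            best_value, best_set = value, list(chosen)
        if i == len(items):
            return
        if lp_bound(items, budget, i, value, weight) <= best_value:
            return  # prune: even the relaxed problem cannot beat the incumbent
        rel, c, k = items[i]
        chosen.append(k)  # branch 1: activate expert k
        branch(i + 1, value + rel, weight + c, chosen)
        chosen.pop()      # branch 2: skip expert k
        branch(i + 1, value, weight, chosen)

    branch(0, 0.0, 0.0, [])
    return best_value, sorted(best_set)


# Hypothetical relevance scores and channel-dependent transmission costs.
print(select_experts([60, 100, 120], [10, 20, 30], budget=50))  # (220.0, [1, 2])
```

Folding a tunable importance factor into `relevance` or `cost` before calling `select_experts` would be one way to shift the tradeoff toward task relevance or toward channel conditions.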
Abstract: Wireless signal recognition (WSR) is a crucial technique for intelligent communications and spectrum sharing in sixth-generation (6G) wireless communication networks. It can be used to enhance network performance and efficiency, improve quality of service (QoS), and strengthen network security and reliability. Additionally, WSR can be applied in military applications such as signal interception, signal race, and signal abduction. Over the past decades, great efforts have been devoted to WSR research. Earlier works mainly focused on model-based methods, including likelihood-based (LB) and feature-based (FB) methods, which held the leading position for many years. With the emergence of artificial intelligence (AI), intelligent methods, including machine learning-based (ML-based) and deep learning-based (DL-based) methods, have been developed to extract features of the received signals and perform classification. In this work, we provide a comprehensive review of WSR covering applications, main tasks, recent advances, datasets and evaluation metrics, challenges, and future directions. Specifically, intelligent WSR methods are introduced from the perspectives of model, data, learning, and implementation. Moreover, we analyze the challenges that complex, dynamic, and open 6G wireless environments pose for WSR and discuss its future directions. This survey is expected to provide a comprehensive overview of state-of-the-art WSR techniques and inspire new research directions for WSR in 6G networks.
Abstract: Generative AI (GenAI) is driving the intelligence of wireless communications. Due to data limitations, random generation, and dynamic environments, GenAI may generate channel information or optimization strategies that violate physical laws or deviate from real-world requirements. We refer to this phenomenon as wireless hallucination; it results in invalid channel information, spectrum wastage, and low communication reliability, yet remains underexplored. To address this gap, this article provides a comprehensive treatment of wireless hallucinations in GenAI-driven communications, focusing on hallucination mitigation. Specifically, we first introduce the fundamentals, analyze the causes of hallucination based on the GenAI workflow, and propose mitigation solutions at the data, model, and post-generation levels. Then, we systematically examine representative hallucination scenarios in GenAI-enabled communications and their corresponding solutions. Finally, we propose a novel integrated mitigation solution for GenAI-based channel estimation. At the data level, we establish a channel estimation hallucination dataset and employ data augmentation based on generative adversarial networks (GANs). Additionally, we incorporate attention mechanisms and large language models (LLMs) to enhance both training and inference performance. Experimental results demonstrate that the proposed hybrid solutions reduce the normalized mean square error (NMSE) by 0.19, effectively reducing wireless hallucinations.
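For reference, NMSE is commonly defined as the squared Frobenius norm of the estimation error normalized by that of the true channel, $\mathrm{NMSE} = \|\mathbf{H} - \hat{\mathbf{H}}\|_F^2 / \|\mathbf{H}\|_F^2$. A minimal NumPy sketch follows; the abstract does not state whether the reported 0.19 reduction is in linear scale or in dB, so both forms are shown, and the example channel values are hypothetical.

```python
import numpy as np


def nmse(h_true, h_est):
    """NMSE in linear scale: ||H - H_hat||_F^2 / ||H||_F^2 (works for complex arrays)."""
    h_true = np.asarray(h_true)
    h_est = np.asarray(h_est)
    return np.sum(np.abs(h_true - h_est) ** 2) / np.sum(np.abs(h_true) ** 2)


def nmse_db(h_true, h_est):
    """Same quantity expressed in decibels."""
    return 10.0 * np.log10(nmse(h_true, h_est))


# Hypothetical two-entry channel and an estimate that misses the second entry's phase.
h = np.array([1 + 1j, 1 - 1j])
h_hat = np.array([1 + 1j, 1 + 0j])
linear = nmse(h, h_hat)
decibels = nmse_db(h, h_hat)
```

Over a test set, NMSE is sometimes averaged per channel realization and sometimes computed on the pooled error; the two conventions give different numbers, so comparisons should use the same one.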
Abstract: Recent advancements in large language models (LLMs) have led to their widespread adoption and large-scale deployment across various domains. However, their environmental impact, particularly during inference, has become a growing concern due to their substantial energy consumption and carbon footprint. Existing research has focused on inference computation alone, overlooking the analysis and optimization of the carbon footprint in network-aided LLM service systems. To address this gap, we propose AOLO, a framework for the analysis and optimization of low-carbon-oriented wireless LLM services. AOLO introduces a comprehensive carbon footprint model that quantifies greenhouse gas emissions across the entire LLM service chain, including computational inference and wireless communication. Furthermore, we formulate an optimization problem that minimizes the overall carbon footprint, which is solved through joint optimization of inference outputs and transmit power under quality-of-experience and system performance constraints. To achieve this joint optimization, we leverage the energy efficiency of spiking neural networks (SNNs) by adopting an SNN as the actor network, and we propose a low-carbon-oriented optimization algorithm, namely SNN-based deep reinforcement learning (SDRL). Comprehensive simulations demonstrate that the SDRL algorithm significantly reduces the overall carbon footprint, achieving an 18.77% reduction compared to the soft actor-critic benchmark, highlighting its potential for enabling more sustainable LLM inference services.
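A carbon footprint model of this kind can be sketched by converting the energy of inference and of wireless transmission into emissions through a grid carbon intensity. In the minimal sketch below, the per-token inference energy, the bits per token, the Shannon-rate AWGN link model, and all parameter names are assumptions, not AOLO's actual model. It does show why inference outputs and transmit power are coupled: more output tokens raise both terms, while higher transmit power shortens airtime but costs more power per second.

```python
import math

JOULES_PER_KWH = 3.6e6


def transmission_energy_j(bits, tx_power_w, bandwidth_hz, channel_gain, noise_psd_w_per_hz):
    """Energy to send `bits` at the Shannon rate of an AWGN link: E = P * bits / R."""
    snr = tx_power_w * channel_gain / (noise_psd_w_per_hz * bandwidth_hz)
    rate_bps = bandwidth_hz * math.log2(1.0 + snr)
    return tx_power_w * bits / rate_bps


def carbon_footprint_g(output_tokens, energy_per_token_j, bits_per_token, tx_power_w,
                       bandwidth_hz, channel_gain, noise_psd_w_per_hz, intensity_g_per_kwh):
    """Hypothetical grams of CO2-equivalent for one response.

    (inference energy + transmission energy), converted to kWh, times grid intensity.
    """
    e_inference = output_tokens * energy_per_token_j
    e_transmit = transmission_energy_j(output_tokens * bits_per_token, tx_power_w,
                                       bandwidth_hz, channel_gain, noise_psd_w_per_hz)
    return (e_inference + e_transmit) / JOULES_PER_KWH * intensity_g_per_kwh


# Hypothetical numbers: 100 tokens, 0.5 J/token, 10 kbit/token, 1 W over 1 MHz at SNR = 1,
# and a grid intensity of 400 gCO2e/kWh.
grams = carbon_footprint_g(100, 0.5, 1e4, 1.0, 1e6, 1e-6, 1e-12, 400.0)
```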
Abstract: This white paper discusses the role of large-scale AI in the telecommunications industry, with a specific focus on the potential of generative AI to revolutionize network functions and user experiences, especially in the context of 6G systems. It highlights the development and deployment of Large Telecom Models (LTMs), which are tailored AI models designed to address the complex challenges faced by modern telecom networks. The paper covers a wide range of topics, from the architecture and deployment strategies of LTMs to their applications in network management, resource allocation, and optimization. It also explores the regulatory, ethical, and standardization considerations for LTMs, offering insights into their future integration into telecom infrastructure. The goal is to provide a comprehensive roadmap for the adoption of LTMs to enhance scalability, performance, and user-centric innovation in telecom networks.
Abstract: Low-Altitude Economy Networks (LAENets) have emerged as significant enablers of social activities, offering low-altitude services such as the transportation of packages, groceries, and medical supplies. Unlike traditional terrestrial networks, LAENets are characterized by control mechanisms and ever-changing operational factors that make them more complex and more susceptible to vulnerabilities. As LAENet applications continue to expand, the robustness of these systems becomes crucial. In this paper, we investigate a novel application of Generative Artificial Intelligence (GenAI) to improve the robustness of LAENets. We conduct a systematic analysis of the robustness requirements of LAENets, complemented by a comprehensive review of robust Quality of Service (QoS) metrics from the wireless physical layer perspective. We then investigate existing GenAI-enabled approaches to robustness enhancement. This leads to our proposal of a novel diffusion-based optimization framework with a Mixture of Experts (MoE)-transformer actor network. In a robust beamforming case study, the proposed framework demonstrates its effectiveness by optimizing beamforming under uncertainty, increasing the worst-case achievable secrecy rate by more than 44%. These findings highlight the significant potential of GenAI for strengthening LAENet robustness.
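A common definition of the secrecy rate of a multi-antenna transmitter serving a single-antenna user in the presence of a single-antenna eavesdropper is $R_s = \big[\log_2(1+|\mathbf{h}_b^H\mathbf{w}|^2/\sigma^2) - \log_2(1+|\mathbf{h}_e^H\mathbf{w}|^2/\sigma^2)\big]^+$, and the robust version takes its minimum over an uncertainty set. The sketch below assumes a fixed beamformer `w`, a perfectly known user channel, and a norm-bounded error $\|\mathbf{e}\| \le \epsilon$ on the eavesdropper's channel estimate only. Under those assumptions the worst case has a closed form, since $|(\mathbf{h}_e+\mathbf{e})^H\mathbf{w}| \le |\mathbf{h}_e^H\mathbf{w}| + \epsilon\|\mathbf{w}\|$ with equality when $\mathbf{e}$ is aligned with $\mathbf{w}$. The paper's diffusion-based optimizer, which searches over `w`, is not shown.

```python
import numpy as np


def secrecy_rate(h_b, h_e, w, noise_power):
    """[log2(1 + |h_b^H w|^2 / s2) - log2(1 + |h_e^H w|^2 / s2)]^+ in bit/s/Hz."""
    snr_b = np.abs(np.vdot(h_b, w)) ** 2 / noise_power  # np.vdot conjugates its first argument
    snr_e = np.abs(np.vdot(h_e, w)) ** 2 / noise_power
    return max(0.0, float(np.log2(1.0 + snr_b) - np.log2(1.0 + snr_e)))


def worst_case_secrecy_rate(h_b, h_e_est, w, noise_power, eps):
    """Exact minimum over eavesdropper channels h_e_est + e with ||e|| <= eps (assumed error model).

    By the triangle and Cauchy-Schwarz inequalities, |(h + e)^H w| <= |h^H w| + eps * ||w||.
    """
    snr_b = np.abs(np.vdot(h_b, w)) ** 2 / noise_power
    eve_amp = np.abs(np.vdot(h_e_est, w)) + eps * np.linalg.norm(w)
    snr_e = eve_amp ** 2 / noise_power
    return max(0.0, float(np.log2(1.0 + snr_b) - np.log2(1.0 + snr_e)))


# Hypothetical two-antenna example with orthogonal user and eavesdropper channels.
h_b = np.array([1.0, 0.0], dtype=complex)
h_e = np.array([0.0, 1.0], dtype=complex)
w = np.array([1.0, 0.0], dtype=complex)
nominal = secrecy_rate(h_b, h_e, w, noise_power=1.0)          # 1.0
robust = worst_case_secrecy_rate(h_b, h_e, w, 1.0, eps=0.5)   # 1 - log2(1.25)
```

A robust beamformer would then be chosen to maximize `worst_case_secrecy_rate` rather than `secrecy_rate`; with more complex uncertainty models, no such closed form usually exists, which is where learned or sampling-based optimizers come in.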
Abstract: Integrated sensing and communication (ISAC) uses the same software and hardware resources to achieve both communication and sensing functionalities. It is therefore one of the core technologies of 6G and has garnered significant attention in recent years. In ISAC systems, a variety of machine learning models are trained to analyze and identify signal patterns, thereby ensuring reliable sensing and communication. However, considering factors such as communication rates, costs, and privacy, collecting sufficient training data from various ISAC scenarios for these models is impractical. Hence, this paper introduces a generative AI (GenAI)-enabled robust data augmentation scheme. The scheme first employs a conditional diffusion model trained on a limited amount of collected channel state information (CSI) data to generate new samples, thereby expanding the sample quantity. Building on this, the scheme further uses another diffusion model to enhance sample quality, thereby facilitating data augmentation in scenarios where the original sensing data is insufficient and unevenly distributed. Moreover, we propose a novel algorithm to estimate the acceleration and jerk of signal propagation path length changes from CSI. We then use the proposed scheme to enhance the estimated parameters and detect the number of targets based on the enhanced data. The evaluation reveals that our scheme improves detection performance by up to 70%, demonstrating reliability and robustness that support the deployment and practical use of ISAC networks.
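Once a path length series has been extracted, acceleration and jerk follow as its second and third time derivatives. The sketch below assumes a single dominant dynamic path whose CSI phase rotates as $-2\pi d(t)/\lambda$ (the sign convention is an assumption), recovers $d(t)$ by phase unwrapping, and approximates the derivatives with finite differences at a uniform sampling interval `dt`. It is not the paper's estimation algorithm, which the abstract does not detail, and real CSI would first need phase sanitization to remove hardware offsets.

```python
import numpy as np


def phase_to_path_length(phase, wavelength):
    """Recover path length from the dynamic-path CSI phase (assumed model: -2*pi*d/wavelength).

    Unwrapping requires the phase to change by less than pi between consecutive samples.
    """
    return -wavelength * np.unwrap(phase) / (2.0 * np.pi)


def path_length_derivatives(d, dt):
    """Finite-difference velocity, acceleration, and jerk of a uniformly sampled path length."""
    d = np.asarray(d, dtype=float)
    velocity = np.diff(d, n=1) / dt
    acceleration = np.diff(d, n=2) / dt**2
    jerk = np.diff(d, n=3) / dt**3
    return velocity, acceleration, jerk


# Hypothetical motion: path length grows as 0.5 * t^2, i.e. constant acceleration of 1 m/s^2.
dt = 0.01
t = np.arange(0.0, 1.0, dt)
true_d = 0.5 * t**2
wrapped = np.angle(np.exp(-2j * np.pi * true_d / 0.06))  # simulated CSI phase, 6 cm wavelength
d_hat = phase_to_path_length(wrapped, wavelength=0.06)
_, acc, jerk = path_length_derivatives(d_hat, dt)
```

Finite differences amplify noise roughly in proportion to $1/\Delta t^k$ for the $k$-th derivative, so on measured CSI the path length would normally be smoothed (for example with a Savitzky-Golay filter) before differencing.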